EUROPEAN ORGANISATION FOR THE SAFETY OF AIR NAVIGATION
EUROCONTROL

Multimodal Interfaces: a Brief Literature Review
Project MMF – EEC Note No. 01/07

ABSTRACT
Multimodal interfaces have for quite some time been considered the "interfaces of the future", aiming to allow more natural interaction and offering new opportunities for parallelism and individual capacity increases. This document provides the reader with an overview of multimodal interfaces and of the results of empirical studies assessing users' performance with multimodal systems. The studies reviewed cover applications from various domains, including air traffic control. The document also discusses a number of limitations of current multimodal interfaces, particularly for the ATC domain, for which a deeper analysis of the costs and benefits associated with multimodal interfaces should be carried out.

FOREWORD

Recent ATC working-position interfaces would appear to have converged on a mouse-windows paradigm for all human-system interactions. Is there no future for any of the other futuristic devices we see in virtual and augmented reality applications, such as wands, speech input and haptics? Could these devices not be more natural channels for interaction? Could they not offer opportunities for parallelism and thus for individual capacity increases?

In order to start studying these issues and to draw a more informed picture of the pros and cons of multimodal interaction, we searched the literature for generic lessons from multimodal interfaces which would be transferable to the ATM domain. The report is supplemented with a number of landmark experiments in ATM itself.

Marc Bourgois, Manager Innovative Studies, EUROCONTROL

TABLE OF CONTENTS

FOREWORD
LIST OF FIGURES
LIST OF TABLES
1. PURPOSES OF THIS DOCUMENT
2. DEFINING MULTIMODALITY
   2.1. FIRST DEFINITION
   2.2. WHY MULTIMODAL INTERFACES?
3. MYTHS OF MULTIMODAL INTERACTION
   3.1. HUMAN-COMPUTER COMMUNICATION CHANNELS (HCCC)
   3.2. HUMAN-CENTRED PERSPECTIVE
   3.3. SYSTEM-CENTRED PERSPECTIVE
   3.4. DESIGN SPACE FOR MULTIMODAL SYSTEMS
   3.5. AN EXAMPLE OF CLASSIFICATION
4. DEVICES: A SHORT SUMMARY
   4.1. INPUT
   4.2. OUTPUT
5. NON-ATM DOMAIN – EMPIRICAL RESULTS
   5.1. DO PEOPLE INTERACT MULTIMODALLY?
   5.2. MULTIMODALITY AND TASK DIFFICULTY
   5.3. MUTUAL DISAMBIGUATION
   5.4. MEMORY
   5.5. THE MCGURK EFFECT
   5.6. CROSS-MODAL INTERACTIONS
6. ATM MULTIMODAL INTERFACES
   6.1. DIGISTRIPS
      6.1.1. DigiStrips evaluation
   6.2. THE ANOTO PEN
   6.3. VIGIESTRIPS
      6.3.1. Vigiestrips evaluation
   6.4. 3D SEMI-IMMERSIVE ATM ENVIRONMENT (LINKÖPING UNIVERSITET)
      6.4.1. Flight information
      6.4.2. Weather information
      6.4.3. Terrain information
      6.4.4. Orientation
      6.4.5. Conflict detection
      6.4.6. Control
      6.4.7. Positional audio
      6.4.8. Interaction mechanisms
      6.4.9. Evaluation
   6.5. AVITRACK
      6.5.1. AVITRACK evaluation
7. CONCLUSIONS
8. ACKNOWLEDGMENTS
9. REFERENCES

LIST OF FIGURES

Figure 1: Basic model for HCCC (adapted from Schomaker et al., 1995)
Figure 2: Design space (after Nigay & Coutaz, 1993)
Figure 3: The NoteBook example within the design space (after Nigay & Coutaz, 1993)
Figure 4: The manipulation properties preserved in DigiStrips (after Mertz & Vinot, 1999)
Figure 5: DigiStrips strokes (after Mertz, Chatty & Vinot, 2000a)
Figure 6: Simple strokes to open menus (after Mertz, Chatty & Vinot, 2000a)
Figure 7: An annotated strip (after Mertz & Vinot, 1999)
Figure 8: The ANOTO pen – technical limitations
Figure 9: Vigiestrips – main working areas (after Salaun, Pene, Garron, Journet & Pavet, 2005)
Figure 10: Snapshot of the 3D ATM system (after Lange, Cooper, Ynnerman & Duong, 2004)
Figure 11: Interaction usability test (after Le-Hong, Tavanti & Dang, 2004)
Figure 12: AVITRACK overview (adapted from www.avitrack.net)

LIST OF TABLES

Table 1: Classification of senses and modalities (adapted from Silbernagel, 1979)

1. PURPOSES OF THIS DOCUMENT

This document provides an overview of multimodal interfaces and of the results of empirical studies assessing users' performance with multimodal systems. It is structured as follows.

In section 2, the notion of multimodality will be defined and explained.

In section 3, the main ideas in support of multimodal interfaces (as well as the "false myths" related to multimodal systems) will be presented. Subsequently, a number of models attempting to classify and structure the available research on multimodal systems will also be proposed.

In section 4, a brief overview will be given of the devices most commonly used in multimodal research and of a number of strong and weak points of each technology.

In sections 5 and 6, a collection of empirical results assessing users' performance with multimodal systems will be provided. As the multimodal literature is not very large with respect to air traffic control (ATC), this document will present both non-ATC and ATC-related results.

The last section will sum up the work and put forward a number of criticisms and possible limitations of multimodal interfaces, particularly for the ATC domain. This document and its preliminary conclusions call for an analysis of the costs and benefits associated with multimodal interfaces, especially for ATC.

2. DEFINING MULTIMODALITY
2.1. FIRST DEFINITION

Multimodal systems "process two or more combined user input modes – such as speech, pen, touch, manual gestures, gaze and head and body movements – in a coordinated manner with multimedia system output" (Oviatt, 2002). These interfaces differ from traditional graphical interfaces in that they aim "to recognize naturally occurring forms of human language and behaviour, which incorporate at least one recognition-based technology" (Oviatt, 2002).

2.2. WHY MULTIMODAL INTERFACES?

Historically, the birth of multimodal interfaces is often identified with the "Put That There" system (Bolt, 1980), in which both speech and gesture are recognised by the system and interpreted as command inputs. The user, sitting in a room in front of a large display, can provide vocal inputs accompanied by deictic gestures which contribute to the identification of an object (Robin, 2004). The user can thus give the command "Put That There" while pointing at an object. Bolt states:

"there, now indicated by gesture, serves in lieu of the entire phrase '. . . to the right of the green square'. The power of this function is even more general. The place description '. . . to the right of the green square' presupposes an item in the vicinity for reference: namely, the green square. There may be no plausible reference frame in terms of already extant items for a word description of where the moved item is to go. The intended spot, however, may readily be indicated by voice-and-pointing: there. In this function, as well as others, some variation in expression is understandably a valuable option" (Bolt, 1980).

The interesting point is that Bolt's system is able to disambiguate an unclear and vague deictic expression – "there" – by interpreting the meaning of the pointing gestures. The great innovation of Bolt's system was to allow users to carry out a task using a more natural way of interacting with the system, exploiting everyday communication strategies such as pointing to objects (a minimal sketch of this kind of fusion is given at the end of this section).

Multimodal interfaces are built with the objective of providing flexible interaction: users can choose from among a selection of input modalities, using one input type, using multiple simultaneous input modalities, or alternating among different modalities. Multimodal interfaces are also characterised by availability, because they are intended to accommodate several user types under multiple circumstances by making a number of interaction modalities available. Moreover, the possibility of using several interaction modalities characterises these interfaces in terms of adaptability, since the user can choose the most appropriate modality depending on the circumstances, and in terms of efficiency, because the interfaces can process inputs in a parallel manner (Robbins, 2004).

Finally, as suggested by the "Put-That-There" example, multimodal interfaces aim to support more natural human-computer interaction. Multimodal interfaces are based on the recognition of natural human behaviour such as speech, gestures, and gaze; new computational capabilities will eventually allow for automatic and seamless interpretation of these behaviours, so that systems will intelligently adapt and respond to the users (Oviatt, 2002). The main benefit would appear to be "natural interaction", a way of carrying out tasks with systems which are able to grasp and understand the way we behave in everyday life.
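To make the fusion idea concrete, here is a minimal sketch, in Python, of how a deictic word such as "there" can be bound to whatever a pointing gesture indicates. It is an illustration under stated assumptions, not Bolt's implementation: the event types, the resolve_deictics function and the 1.5-second binding threshold are all invented for the example.

```python
from dataclasses import dataclass

# Hypothetical event types; real recognisers would produce richer structures.
@dataclass
class SpeechEvent:
    words: list[str]        # e.g. ["put", "that", "there"]
    times: list[float]      # onset time of each word, in seconds

@dataclass
class PointingEvent:
    target: str             # object id or location, e.g. "green_square"
    time: float             # moment the pointing gesture was held

DEICTIC_WORDS = {"this", "that", "here", "there"}

def resolve_deictics(speech: SpeechEvent, gestures: list[PointingEvent],
                     max_gap: float = 1.5) -> list[str]:
    """Replace each deictic word with the target of the pointing gesture
    closest in time, mimicking Bolt-style voice-and-pointing fusion."""
    resolved = []
    for word, t in zip(speech.words, speech.times):
        if word in DEICTIC_WORDS and gestures:
            nearest = min(gestures, key=lambda g: abs(g.time - t))
            # Only bind the word if a gesture is close enough in time.
            if abs(nearest.time - t) <= max_gap:
                resolved.append(nearest.target)
                continue
        resolved.append(word)
    return resolved

# Usage: "put that there" accompanied by two pointing gestures.
speech = SpeechEvent(["put", "that", "there"], [0.0, 0.4, 1.1])
gestures = [PointingEvent("green_square", 0.3), PointingEvent("map_cell_B4", 1.0)]
print(resolve_deictics(speech, gestures))
# ['put', 'green_square', 'map_cell_B4']
```

A real system would of course operate on recogniser hypotheses and richer spatial context; the point here is only the temporal binding of deictic words to gesture targets.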
However, the design of multimodal interfaces is often based on common sense and false assumptions, constituting a sort of misleading mythology.

3. MYTHS OF MULTIMODAL INTERACTION

There are great expectations surrounding multimodal interfaces and multimodal interaction. Often, these expectations lead to mistaken beliefs which have little or nothing to do with the actual "empirical reality" (Oviatt, 2002). Oviatt (1999) summarises these false beliefs on the basis of empirical evidence. She provides a list of ten myths of multimodal interaction and explains how these myths should be "corrected" in order to meet real users' requirements.

Myth No. 1: If you build a multimodal system, the user will interact multimodally.

According to a study carried out by Oviatt (1997), 95-100% of users interacted multimodally when they were free to use either speech or pen input in a spatial domain. However, users would appear to mix unimodal and multimodal interaction, depending on the requirements of the task at hand. Multimodal interactions would appear to be related to spatial content (e.g. calculation of the distance between objects, specification of distances among objects, etc.). When the action does not entail spatiality, users are not likely to interact multimodally.

Myth No. 2: Speech-and-pointing is the dominant multimodal integration pattern.

This type of interaction seems dominant because most interfaces implement it, especially to resolve "deictic" forms (i.e. to resolve the meaning of expressions such as "that" or "there", which require a reference to something). However, this choice would appear to be a sort of "new implementation style" for the traditional mouse interaction paradigm. Speak-and-point interaction accounts for only 14% of all spontaneous multimodal interactions (Oviatt, DeAngeli & Kuhn, 1997). Oviatt also mentions the results of past research on interpersonal communication (McNeill, 1992) indicating that pointing accounts for less than 20% of gestures.

Myth No. 3: Multimodal input involves simultaneous signals.

It is often assumed that users "act multimodally", using different modalities in a simultaneous manner. Taking deictic expressions as an example, one might think that users would point at something while simultaneously saying, for example, "there". This overlapping is yet another myth: deictic expressions overlapped with pointing in only 25% of cases in the empirical study of Oviatt et al. (1997). In actual fact, gesturing would often appear to precede spoken input. The presence of a degree of synchronisation between signals should not be misunderstood, since synchronisation is not co-occurrence (the first sketch after this list illustrates such an integration window).

Myth No. 4: Speech is the primary input mode in any multimodal system that includes it.

Another commonplace belief is that speech is the primary mode, so that the other input modalities should be considered as a sort of compensation, redundant modes which can "take over", especially if the primary mode (i.e. speech) is degraded. This myth should remain just that – a myth. In fact, there are modes which can convey information not efficiently conveyed by speech (e.g. spatial information). Moreover, as previously stated, different modalities are used in a very articulated and not necessarily redundant manner; for example, many gesture signals precede speech.
Myth No. 5: Multimodal language does not differ linguistically from unimodal language.

According to Oviatt (1999), every language has its own peculiarities. For example, pen/voice language seems briefer and syntactically simpler than unimodal speech. When users are free to interact using the modality of their choice, they are likely to selectively avoid linguistic complexities.

Myth No. 6: Multimodal integration involves redundancy of content between modes.

Multimodal communication can be considered as a means of "putting together" content in a complementary rather than redundant manner. Different modes contribute different and complementary information. As stated above, locative information is often written, while subject-verb-object information is more likely to be spoken. Multiple communication modes do not imply duplicate information.

Myth No. 7: Individual error-prone recognition technologies combine multimodally to produce even greater unreliability.

It is generally thought that combining error-prone recognition technologies (such as speech and pen-input recognition) will produce many composite errors. In fact, multimodal systems are reasonably robust to errors. Users naturally know when and how to use a given input mode (instead of another) in the most efficient manner: they are likely to deploy the most effective mode and avoid using the more error-prone input. Oviatt (1999) also speaks of mutual disambiguation of two input signals, i.e. the recovery from errors thanks to the joint interpretation of two input signals. For example, if the system recognises not only the word "ditch" but also a number of parallel graphic marks, then the word "ditch" will be interpreted as "ditches" (a toy illustration follows the list of myths).

Myth No. 8: All users' multimodal commands are integrated in a uniform way.

Users are characterised by individual differences and deploy different strategies while interacting, in accordance with their preferences. Multimodal systems, then, should detect these differences and adapt to each user's dominant integration pattern.

Myth No. 9: Different input modes are capable of transmitting comparable content.

From a technology-oriented perspective, the various modes might appear to be interchangeable, able to transmit comparable content with equal efficiency. This is not the case. Every mode is unique: the type of information transmitted, the way it is transmitted, and the functionality of each mode during communication are specific.

Myth No. 10: Enhanced efficiency is the main advantage of multimodal systems.

It has not been demonstrated that efficiency is a substantial gain outside spatial domains: during multimodal pen/voice interaction in a spatial domain, a 10% speed-up was obtained in comparison with a speech-only interface (Oviatt, 1997). Apart from efficiency, however, multimodal interfaces have other substantial advantages: for example, they are more flexible (users can switch among modalities and make choices), and they can accommodate a wider range of users and tasks than unimodal interfaces.
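Regarding Myth No. 3, the following minimal sketch shows how a fusion engine can group a gesture and a spoken utterance into one multimodal command without requiring them to overlap, tolerating the gesture-precedes-speech pattern reported above. It is an illustration only; the window sizes and the function name are invented for the example, not taken from Oviatt's work.

```python
def same_multimodal_command(speech_onset: float, gesture_onset: float,
                            max_lead: float = 4.0, max_lag: float = 1.0) -> bool:
    """Group two signals into one command if the gesture falls inside a
    temporal integration window around the speech: it may precede speech
    by up to `max_lead` seconds or trail it by up to `max_lag` seconds.
    Co-occurrence (overlap) is not required."""
    offset = gesture_onset - speech_onset
    return -max_lead <= offset <= max_lag

# A gesture made 1.5 s before the word "there" is still fused with it:
print(same_multimodal_command(speech_onset=2.0, gesture_onset=0.5))  # True
```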
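The mutual disambiguation of Myth No. 7 can likewise be made concrete. The sketch below is our own toy re-ranking scheme, not Oviatt's implementation; the confidence and compatibility numbers are invented. It jointly rescores the n-best lists of two error-prone recognisers, so that the gesture evidence (several parallel marks) rescues the weaker speech hypothesis "ditches".

```python
# Each n-best list pairs an interpretation with a recogniser confidence.
speech_nbest = [("ditch", 0.55), ("ditches", 0.40), ("duck", 0.05)]
sketch_nbest = [("parallel_marks", 0.70), ("single_mark", 0.30)]

# Hypothetical cross-modal compatibility: several parallel marks fit a
# plural noun better than a singular one. Unlisted pairs get a low default.
COMPATIBILITY = {
    ("ditch", "single_mark"): 1.0,
    ("ditch", "parallel_marks"): 0.2,
    ("ditches", "parallel_marks"): 1.0,
    ("ditches", "single_mark"): 0.2,
}

def fuse(speech_nbest, sketch_nbest):
    """Return the jointly most plausible (speech, sketch) interpretation."""
    best, best_score = None, -1.0
    for s_text, s_conf in speech_nbest:
        for g_text, g_conf in sketch_nbest:
            score = s_conf * g_conf * COMPATIBILITY.get((s_text, g_text), 0.1)
            if score > best_score:
                best, best_score = (s_text, g_text), score
    return best

print(fuse(speech_nbest, sketch_nbest))
# ('ditches', 'parallel_marks'): the top speech hypothesis "ditch" is
# overridden because the drawn marks disambiguate the plural form.
```

The design point is that each recogniser's errors are partly independent, so a weak hypothesis in one mode can be promoted by strong, compatible evidence in the other, which is why composite error rates need not grow.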
3.1. HUMAN-COMPUTER COMMUNICATION CHANNELS (HCCC)

When speaking of multimodal interfaces, it is important to define some basic notions concerning the input/output (I/O) modalities implied in the interaction between interfaces (or, more generally, computers) and humans. According to Schomaker et al. (1995), we can identify four I/O channels (cf. Figure 1): the human output channels (HOC) and computer input modalities (CIM) describe the input, while the computer output media (COM) and human input channels (HIC) define the output (or feedback).

Figure 1: Basic model for HCCC (adapted from Schomaker et al., 1995)

Two processes are involved in interaction: perception (entailing human input and computer output) and control (entailing human output and computer input). The information flow can be defined as the sum of the cross-talking perception and control communication channels; however, the complexity of human information processing channels hinders the elaboration of a model describing multimodal integration, and improvements in multimodal interface design can only be supported by empirical results (Popescu, Burdea & Trefftz, 2002).

This model introduces the interaction and communication flows between humans and computers. Indeed, there are some attempts in the literature to provide an "organised vision" of multimodal interfaces, focusing either on the human or on the system.

3.2. HUMAN-CENTRED PERSPECTIVE

Raisamo (1999) distinguishes two different perspectives that seem to guide the development of multimodal interaction. The first view is human-centred. It builds on the idea that modality is closely related to the human senses. Raisamo refers to the classification by Silbernagel (1979), in which the senses and their corresponding modalities are listed.

Table 1: Classification of senses and modalities (adapted from Silbernagel, 1979)

Sensory perception | Sensory organ         | Modality
-------------------|-----------------------|-----------
Sense of sight     | Eyes                  | Visual
Sense of hearing   | Ears                  | Auditive
Sense of touch     | Skin                  | Tactile
Sense of smell     | Nose                  | Olfactory
Sense of taste     | Tongue                | Gustatory
Sense of balance   | Organ of equilibrium  | Vestibular
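As a compact restatement of the HCCC model, the sketch below encodes the four channels and the two interaction loops as Python types. The expansions of HOC/CIM/COM/HIC follow the reading of Schomaker et al.'s taxonomy given above; the types and example modalities in the comments are purely illustrative.

```python
from enum import Enum

class Channel(Enum):
    """The four I/O channels of the HCCC model (Schomaker et al., 1995)."""
    HOC = "human output channels"      # e.g. speech, handwriting, gesture
    CIM = "computer input modalities"  # e.g. microphone, pen, camera
    COM = "computer output media"      # e.g. display, loudspeaker
    HIC = "human input channels"       # e.g. vision, hearing, touch

# Control runs from human output to computer input;
# perception runs from computer output back to human input.
LOOPS = {
    "control":    (Channel.HOC, Channel.CIM),
    "perception": (Channel.COM, Channel.HIC),
}

for name, (src, dst) in LOOPS.items():
    print(f"{name}: {src.name} -> {dst.name}")
```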